This project is a personal exploration of quantitative conservation management using publicly available data from the California Department of Fish and Wildlife (CDFW). The primary goals are to visualize and analyze fish population data from the Salton Sea to understand species trends, habitat use, and catch-per-unit-effort (CPUE). This work is meant for learning, data visualization, and sharing insights — not for official reporting.
⸻
1. Track population changes over time for each fish species.
2. Identify habitat types where each species is most abundant.
3. Calculate overall and species-specific CPUE.
4. Visualize fish abundance data spatially across different sampling sites.
⸻
The Salton Sea is California’s largest inland waterbody, vital to fish populations and migratory birds on the Pacific Flyway. However, the ecosystem is under stress due to declining water levels, increasing salinity, and reduced inflows, impacting both wildlife and air quality.
In response, the CDFW initiated a long-term fish monitoring program in 2003, sampling 14 stations quarterly across three habitat types: pelagic (open water), near-shore, and estuarine zones. Sampling is conducted with gill nets deployed for approximately 24 hours per site, with fish identified, counted, and measured for key biological indicators (length, weight, sex, condition, etc.).
• Some seasonal data are missing (e.g., Fall and Winter 2007) due to launch site inaccessibility.
• Initial deep-water sites were eliminated after year one due to zero catch results.
• Sampling dates can vary; seasonal boundaries may not align cleanly with calendar months.
• Anomalous data (e.g., Oct 13, 2004) were re-sampled due to extreme weather-driven changes.
• The primary goal of data collection is to monitor presence/absence and trends in fish populations.
⸻
• CPUE calculations and fish abundance data may contain outliers or missing entries due to weather or equipment issues.
• Data should be interpreted with the context of biological and environmental variability in mind.
• This dataset complements the Quarterly Water Quality Surveys - Salton Sea [ds429].
⸻
This dataset is licensed under Creative Commons Attribution 4.0 International License. Attribution per CDFW BIOS citation standards satisfies the licensing requirements.
Disclaimer: The State of California provides this data without guarantees of accuracy or completeness. Use at your own discretion.
Background Information on the Salton Sea
This data came from Quarterly Fishery Surveys - Salton Sea [ds428]
# Load in libraries
library("ggplot2") # For graphing
library("tidyverse") # For data wrangling
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.4 ✔ tibble 3.2.1
## ✔ purrr 1.0.4 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("readr") # For reading the comma-separated values (.csv) file
library("sf") # For reading the shapefile (.shp)
## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
library("ggspatial") # For graphing spatial data
library("prettymapr") # For making pretty map graphics and annotations
library("ggrepel") # For adjusting the annotations in the map
library("ggbreak") # For creating breaks in the x or y axes
## ggbreak v0.1.4 Learn more at https://yulab-smu.top/
##
##
## If you use ggbreak in published research, please cite the following
## paper:
##
## S Xu, M Chen, T Feng, L Zhan, L Zhou, G Yu. Use ggbreak to effectively
## utilize plotting space to deal with large datasets and outliers.
## Frontiers in Genetics. 2021, 12:774846. doi: 10.3389/fgene.2021.774846
library("patchwork") # For combining multiple ggplots into one layout
library("gridExtra") # For plotting multiple graphs into one figure
##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
library("grid") # For making a common x and y axis for the grouped graph
library("klippy") # For making the code easily copied from the HTML page
# Set working directory
setwd("~/Desktop/Universal Folder/Data Analysis Projects/Salton Sea [ds429]/Quarterly_Fishery_Surveys_-_Salton_Sea")
dat <- read_csv("Quarterly_Fishery_Surveys_-_Salton_Sea_(ds428).csv")
## Rows: 409 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): PULL_DATE, SITE, HABITAT_TY, OTHER, COMMENTS, REPORT
## dbl (10): X, Y, OBJECTID, NET_HRS, TILAPIA, CORVINA, SARGO, CROAKER, UTM_E, ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(dat)
## # A tibble: 6 × 16
## X Y OBJECTID PULL_DATE SITE HABITAT_TY NET_HRS TILAPIA OTHER
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr>
## 1 -12906353. 3964287. 1 2008/04/0… Nort… Near-shore 24.2 1453 0
## 2 -12918979. 3962129. 2 2008/04/0… Whit… Estuarine 24 886 0
## 3 -12895737. 3952891. 3 2008/04/1… Bat … Near-shore 23.5 610 0
## 4 -12917368. 3952583. 4 2008/04/1… Dese… Near-shore 24 262 0
## 5 -12908344. 3941190. 5 2008/04/1… The … Near-shore 24.2 778 0
## 6 -12886715. 3942740. 6 2008/04/1… The … Near-shore 24 524 0
## # ℹ 7 more variables: CORVINA <dbl>, SARGO <dbl>, CROAKER <dbl>,
## # COMMENTS <chr>, UTM_E <dbl>, UTM_N <dbl>, REPORT <chr>
# Display the data type of each column within the data.
sapply(dat, class)
## X Y OBJECTID PULL_DATE SITE HABITAT_TY
## "numeric" "numeric" "numeric" "character" "character" "character"
## NET_HRS TILAPIA OTHER CORVINA SARGO CROAKER
## "numeric" "numeric" "character" "numeric" "numeric" "numeric"
## COMMENTS UTM_E UTM_N REPORT
## "character" "numeric" "numeric" "character"
# Add a Date column by parsing PULL_DATE (year/month/day hour:minute:second) and dropping the time of day.
dat <- dat %>%
  mutate(Date = as.Date(ymd_hms(PULL_DATE)))
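The date conversion can be sanity-checked on a few strings before trusting it on the full column. The sketch below uses made-up timestamps in the layout that `ymd_hms()` is assumed to receive from PULL_DATE (the real values are truncated in the printed tibble), plus one deliberately bad entry to show how parse failures surface as `NA`.

```r
library(lubridate)

# Hypothetical timestamps; the exact PULL_DATE layout is an assumption
pull_date <- c("2008/04/08 00:00:00", "2008/04/10 00:00:00", "not a date")

# ymd_hms() returns POSIXct (UTC by default); unparseable strings become NA
parsed <- suppressWarnings(ymd_hms(pull_date))

# as.Date() drops the time-of-day component
dates <- as.Date(parsed)
dates

# Number of entries that failed to parse
n_failed <- sum(is.na(dates))
n_failed
```

Running `sum(is.na(dat$Date))` after the real conversion gives the same check on the survey data.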
long_dat <- pivot_longer(dat, cols = c(TILAPIA, CORVINA, SARGO, CROAKER), values_to = "Fish_Abundance", names_to = "Species")
head(long_dat)
## # A tibble: 6 × 15
## X Y OBJECTID PULL_DATE SITE HABITAT_TY NET_HRS OTHER COMMENTS
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 -12906353. 3964287. 1 2008/04/… Nort… Near-shore 24.2 0 <NA>
## 2 -12906353. 3964287. 1 2008/04/… Nort… Near-shore 24.2 0 <NA>
## 3 -12906353. 3964287. 1 2008/04/… Nort… Near-shore 24.2 0 <NA>
## 4 -12906353. 3964287. 1 2008/04/… Nort… Near-shore 24.2 0 <NA>
## 5 -12918979. 3962129. 2 2008/04/… Whit… Estuarine 24 0 <NA>
## 6 -12918979. 3962129. 2 2008/04/… Whit… Estuarine 24 0 <NA>
## # ℹ 6 more variables: UTM_E <dbl>, UTM_N <dbl>, REPORT <chr>, Date <date>,
## # Species <chr>, Fish_Abundance <dbl>
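As a small illustration of what the reshape does, the sketch below applies `pivot_longer()` to a two-row table with hypothetical site names and counts. Each input row becomes one row per pivoted species column, which is why the long table has a multiple of the original 409 rows.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per net set, one column per species
wide <- tibble(
  SITE    = c("Site A", "Site B"),
  TILAPIA = c(10, 3),
  CORVINA = c(0, 1)
)

# One row per (net set, species) pair
long <- pivot_longer(wide,
                     cols = c(TILAPIA, CORVINA),
                     names_to = "Species",
                     values_to = "Fish_Abundance")
long

# Row count grows by the number of pivoted columns
nrow(long) == nrow(wide) * 2
```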
Fish_Stats <- long_dat %>%
group_by(Species) %>%
summarise(total = sum(Fish_Abundance),
mean = round(mean(Fish_Abundance), 3),
min = min(Fish_Abundance),
max = max(Fish_Abundance),
sd = sd(Fish_Abundance))
Fish_Stats
## # A tibble: 4 × 6
## Species total mean min max sd
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CORVINA 19 0.046 0 5 0.377
## 2 CROAKER 180 0.44 0 54 3.36
## 3 SARGO 0 0 0 0 0
## 4 TILAPIA 55613 136. 0 2004 278.
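Other_Fish <- dat %>%
  filter(OTHER != "0") %>% # Keep only rows where something besides the four target species was recorded. An exact comparison avoids dropping entries that merely contain a zero, such as "10 Shad".
  select(OTHER)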
Other_Fish
## # A tibble: 17 × 1
## OTHER
## <chr>
## 1 2 Mollies
## 2 1 Mullet
## 3 1 Grebe
## 4 2 Shad
## 5 1 Shad
## 6 2 Grebes
## 7 1 Shad
## 8 1 Pupfish
## 9 2 pupfish
## 10 1 St. bass
## 11 1 Molly
## 12 1 Grebe
## 13 1 Grebe
## 14 1 St. Bass
## 15 1 Shad
## 16 1 Pupfish, 1 Shad
## 17 1 Shad
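The OTHER column is free text, so entries such as "1 Pupfish, 1 Shad" and "2 pupfish" cannot be summed directly. One possible way to tidy it, sketched here on a handful of values copied from the output above, is to split multi-entry cells, pull out the leading count, and normalize case. The column names `n_caught`, `taxon`, and `total` are illustrative choices, not part of the dataset.

```r
library(dplyr)
library(tidyr)
library(stringr)

# A few OTHER values taken from the listing above, plus a "0" placeholder
other <- tibble::tibble(
  OTHER = c("0", "2 Mollies", "1 St. bass", "1 Pupfish, 1 Shad", "2 pupfish")
)

other_counts <- other %>%
  filter(OTHER != "0") %>%
  separate_longer_delim(OTHER, delim = ", ") %>%  # one row per "count name" entry
  mutate(
    n_caught = as.integer(str_extract(OTHER, "^\\d+")),               # leading number
    taxon    = str_to_lower(str_trim(str_remove(OTHER, "^\\d+")))     # remaining label
  ) %>%
  count(taxon, wt = n_caught, name = "total")

other_counts
```

This sketch does not merge singular and plural labels (for example "molly" and "mollies"), and it would count the grebe entries, which are birds rather than fish, so a lookup table of cleaned names would still be needed for a real tally.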
ggplot(Fish_Stats, aes(x = Species, y = total, fill = Species)) +
geom_col() +
geom_errorbar(aes(ymax = total + sd, # Adding error bars
ymin = total - sd),
width = 0.70) +
scale_y_break(c(200, 5000), scales = 1) + # Break in the y-axis
theme_bw() +
ggtitle("Cumulative Abundance of Each Fish Species \n2003 to 2008") +
ylab("Abundance of Fish")
No sargo were caught during this survey. To focus on the species that were observed, the “SARGO” rows are removed from the long-format data before visualization.
long_dat <- long_dat %>%
  filter(Species != "SARGO")
long_dat
## # A tibble: 1,227 × 15
## X Y OBJECTID PULL_DATE SITE HABITAT_TY NET_HRS OTHER COMMENTS
## <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 -12906353. 3.96e6 1 2008/04/… Nort… Near-shore 24.2 0 <NA>
## 2 -12906353. 3.96e6 1 2008/04/… Nort… Near-shore 24.2 0 <NA>
## 3 -12906353. 3.96e6 1 2008/04/… Nort… Near-shore 24.2 0 <NA>
## 4 -12918979. 3.96e6 2 2008/04/… Whit… Estuarine 24 0 <NA>
## 5 -12918979. 3.96e6 2 2008/04/… Whit… Estuarine 24 0 <NA>
## 6 -12918979. 3.96e6 2 2008/04/… Whit… Estuarine 24 0 <NA>
## 7 -12895737. 3.95e6 3 2008/04/… Bat … Near-shore 23.5 0 <NA>
## 8 -12895737. 3.95e6 3 2008/04/… Bat … Near-shore 23.5 0 <NA>
## 9 -12895737. 3.95e6 3 2008/04/… Bat … Near-shore 23.5 0 <NA>
## 10 -12917368. 3.95e6 4 2008/04/… Dese… Near-shore 24 0 <NA>
## # ℹ 1,217 more rows
## # ℹ 6 more variables: UTM_E <dbl>, UTM_N <dbl>, REPORT <chr>, Date <date>,
## # Species <chr>, Fish_Abundance <dbl>
Fish_Habitat_Stats <- long_dat %>%
group_by(Species, HABITAT_TY) %>%
summarise(total = sum(Fish_Abundance),
mean = round(mean(Fish_Abundance), 3),
min = min(Fish_Abundance),
max = max(Fish_Abundance),
sd = sd(Fish_Abundance))
## `summarise()` has grouped output by 'Species'. You can override using the
## `.groups` argument.
Fish_Habitat_Stats
## # A tibble: 9 × 7
## # Groups: Species [3]
## Species HABITAT_TY total mean min max sd
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CORVINA Estuarine 3 0.029 0 1 0.168
## 2 CORVINA Near-shore 16 0.059 0 5 0.450
## 3 CORVINA Pelagic 0 0 0 0 0
## 4 CROAKER Estuarine 33 0.317 0 10 1.48
## 5 CROAKER Near-shore 147 0.538 0 54 4.01
## 6 CROAKER Pelagic 0 0 0 0 0
## 7 TILAPIA Estuarine 11029 106. 0 1120 239.
## 8 TILAPIA Near-shore 44584 163. 0 2004 302.
## 9 TILAPIA Pelagic 0 0 0 0 0
ggplot(Fish_Habitat_Stats, aes(x = HABITAT_TY, y = total, fill = Species)) +
geom_col() +
geom_errorbar(aes(ymax = total + sd,
ymin = total - sd),
width = 0.7) +
ggtitle("Abundance of Fish at Different Habitat Types") +
ylab("Abundance of Fish") +
xlab("Habitat Type") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 7)) +
facet_wrap(. ~ Species, scales = "free")
No fish were caught in the pelagic habitat. For every species, most fish were caught in the near-shore habitat.
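Raw totals per habitat depend partly on how many nets were set there, so a habitat comparison may be easier to read when catch is divided by effort. The sketch below uses invented numbers (not survey data) to show a case where the habitat with the larger total has the lower catch rate; the column names mirror those in `long_dat`.

```r
library(dplyr)

# Invented example: three near-shore sets and one estuarine set
toy <- tibble::tibble(
  HABITAT_TY     = c("Near-shore", "Near-shore", "Near-shore", "Estuarine"),
  NET_HRS        = c(24, 24, 24, 24),
  Fish_Abundance = c(30, 10, 20, 40)
)

habitat_cpue <- toy %>%
  group_by(HABITAT_TY) %>%
  summarise(
    n_sets = n(),                       # number of net sets in the habitat
    total  = sum(Fish_Abundance),       # raw catch
    CPUE   = total / sum(NET_HRS),      # catch per net-hour
    .groups = "drop"
  )

habitat_cpue
```

Applying the same `summarise()` to `long_dat`, grouped by `Species` and `HABITAT_TY`, would show whether the near-shore pattern holds once effort is accounted for.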
Species_based_CPUE <- long_dat %>%
group_by(Species) %>%
summarise(total_abundance = sum(Fish_Abundance),
total_hr = sum(NET_HRS),
CPUE = round(total_abundance/total_hr, 4),
sd = sd(Fish_Abundance/NET_HRS))
Species_based_CPUE
## # A tibble: 3 × 5
## Species total_abundance total_hr CPUE sd
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 CORVINA 19 10191. 0.0019 0.0156
## 2 CROAKER 180 10191. 0.0177 0.139
## 3 TILAPIA 55613 10191. 5.46 11.2
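The summary above reports CPUE as total catch divided by total net-hours, while the `sd` column is the standard deviation of the per-set ratios. These two quantities describe slightly different things, and the sketch below (with invented values) shows that the ratio of totals and the mean of per-set ratios diverge when net-hours vary between sets.

```r
library(dplyr)

# Invented net sets with unequal soak times
sets <- tibble::tibble(
  catch   = c(120, 0, 48),
  net_hrs = c(24, 12, 24)
)

cpue <- sets %>%
  summarise(
    ratio_of_totals = sum(catch) / sum(net_hrs),  # same form as CPUE above
    mean_of_ratios  = mean(catch / net_hrs),      # average of per-set rates
    sd_of_ratios    = sd(catch / net_hrs)         # same form as sd above
  )

cpue
```

When every set fishes for roughly 24 hours, as in this survey, the two estimates should be close, which makes pairing the ratio of totals with the per-set standard deviation a reasonable approximation.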
CPUE_Bar_Plot <- ggplot(data = Species_based_CPUE, aes(x = Species, y = CPUE, fill = Species)) +
geom_col() +
geom_errorbar(aes(ymax = CPUE + sd,
                  ymin = 0), # Error bars are floored at 0 because a catch rate cannot be negative
              width = 0.7) + # width is a fixed setting, so it belongs outside aes()
theme_bw() +
ggtitle("Species-Based Catch-Per-Unit-Effort") +
ylab("CPUE (fish caught per net-hour)")
CPUE_Bar_Plot
On average, about 5.5 tilapia were caught per net-hour across the entire survey, while corvina and croaker were caught far less often (roughly 0.002 and 0.018 fish per net-hour, respectively).
Fish_Time_Line <- function(data, x = Date, y, title = "Fish Over Time"){
  # `x` and `y` are bare column names; {{ }} forwards them into aes()
  ggplot(data = data, aes(x = {{x}}, y = {{y}})) +
    geom_smooth() + # Smoothed trend with its confidence band
    geom_line(alpha = 0.25) + # Raw observations
    theme_bw() +
    labs(title = title) +
    xlab("Date") +
    ylab("Abundance") +
    theme(
      axis.title.x = element_text(size = 10),
      axis.title.y = element_text(size = 10),
      plot.title = element_text(size = 15))
}
Tilapia_Over_Time <- Fish_Time_Line(data = dat, x = Date, y = TILAPIA, title = "Tilapia Counts Over Time")
Tilapia_Over_Time
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Corvina_Over_Time <- Fish_Time_Line(data = dat, x = Date, y = CORVINA, title = "Corvina Counts Over Time")
Corvina_Over_Time
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Croaker_Over_Time <- Fish_Time_Line(data = dat, x = Date, y = CROAKER, title = "Croaker Counts Over Time")
Croaker_Over_Time
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
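The plotting helper relies on passing bare column names into `aes()`. A minimal sketch of that pattern, using the `{{ }}` (embrace) operator with a hypothetical helper `plot_xy()` and a toy data frame, looks like this:

```r
library(ggplot2)

# Hypothetical helper: `x` and `y` are unquoted column names of `data`
plot_xy <- function(data, x, y) {
  ggplot(data, aes(x = {{ x }}, y = {{ y }})) +
    geom_line()
}

# Toy data, not survey data
toy <- data.frame(day = 1:5, count = c(2, 4, 3, 8, 6))

p <- plot_xy(toy, x = day, y = count)
p
```

Without the embrace, a column name written literally inside the function body is looked up as-is, so the argument supplied by the caller is silently ignored.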
ggplot(long_dat, aes(x = Date, y = Fish_Abundance, color = Species, fill = Species)) +
geom_smooth() +
theme_bw() +
ggtitle("Abundance of Fish Over Time") +
ylab("Abundance of Fish")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
The sheer number of tilapia drowns out the croaker and corvina trends when all three species share one y-axis.
# Creating common titles for the title, x and y axes.
title.common <- textGrob("Species of Fish Over Time",
gp=gpar(fontsize=20))
y.common <- textGrob("Abundance",
gp=gpar(fontsize=15), rot=90)
x.common <- textGrob("Date",
gp=gpar(fontsize=15))
# Arrange the graphs in a grid, remove the x- and y-axis titles of the original graphs, and replace them with the common axis titles.
grid.arrange(arrangeGrob(Tilapia_Over_Time + ylab("") + xlab("") + ggtitle("Tilapia"),
Corvina_Over_Time + ylab("") + xlab("") + ggtitle("Corvina"),
Croaker_Over_Time + ylab("") + xlab("") + ggtitle("Croaker"),
ncol = 3,
top = title.common,
left = y.common,
bottom = x.common))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Tilapia abundance increased over the survey period. Corvina and croaker followed a pattern similar to each other: both were observed at the beginning of the survey, but soon afterward there were few to no observations of either species.
# Load in the shapefiles
shape_data <- st_read("~/Desktop/Universal Folder/Data Analysis Projects/Salton Sea [ds429]/Quarterly_Fishery_Surveys_-_Salton_Sea") |>
st_transform(3857) # Transform for compatibility with basemap tiles
## Reading layer `Quarterly_Fishery_Surveys_-_Salton_Sea_[ds428]' from data source
## `/Users/bodhi_hueffmeier/Desktop/Universal Folder/Data Analysis Projects/Salton Sea [ds429]/Quarterly_Fishery_Surveys_-_Salton_Sea'
## using driver `ESRI Shapefile'
## Simple feature collection with 409 features and 14 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -12918980 ymin: 3913176 xmax: -12870890 ymax: 3964287
## Projected CRS: WGS 84 / Pseudo-Mercator
# Reading the column names of the shapefile
names(shape_data)
## [1] "OBJECTID" "PULL_DATE" "SITE" "HABITAT_TY" "NET_HRS"
## [6] "TILAPIA" "OTHER" "CORVINA" "SARGO" "CROAKER"
## [11] "COMMENTS" "UTM_E" "UTM_N" "REPORT" "geometry"
# Extract coordinates and bind to original data
shape_data_coords <- shape_data %>%
mutate(lon = st_coordinates(geometry)[,1],
lat = st_coordinates(geometry)[,2])
# Keep the first SITE for each unique location
unique_sites <- shape_data_coords %>%
group_by(lon, lat) %>%
slice(1) %>% # Keep only the first row for each point, to make the names unique
ungroup() %>%
# Remove "(bottom) 1" from site names and trim leading/trailing whitespace.
# Some site names also record the depth at which the site was surveyed,
# and those additional labels would clutter the map.
mutate(SITE = trimws(gsub("\\(bottom\\) 1", "", SITE)))
# Plot all points, label only one per unique location
ggplot() +
annotation_map_tile(zoom = 10) +
geom_sf(data = shape_data, color = "blue", size = 2) +
geom_text_repel(data = unique_sites, # To add the unique site names to each point
aes(x = lon, y = lat, label = SITE),
size = 3) +
coord_sf(crs = st_crs(shape_data)) +
theme_minimal() +
labs(title = "Salton Sea Survey Stations")
## Zoom: 10
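The reprojection and coordinate extraction used for the station map can be tried on a small example. The sketch below builds two points from hypothetical longitude/latitude values (not actual station locations), reprojects them to Web Mercator (EPSG:3857, the CRS commonly used by basemap tiles), and pulls the projected coordinates into plain columns for labeling.

```r
library(sf)

# Hypothetical station coordinates in WGS 84 longitude/latitude
stations <- data.frame(
  SITE = c("Station A", "Station B"),
  lon  = c(-115.90, -115.75),
  lat  = c(33.45, 33.30)
)

# Build point geometries, then reproject to Web Mercator
pts_wgs84 <- st_as_sf(stations, coords = c("lon", "lat"), crs = 4326)
pts_merc  <- st_transform(pts_wgs84, 3857)

# st_coordinates() returns a matrix with columns "X" and "Y"
xy <- st_coordinates(pts_merc)
pts_merc$x <- xy[, "X"]
pts_merc$y <- xy[, "Y"]

pts_merc
```

The projected X values come out in meters and in the same range as the bounding box printed by `st_read()` above, which is a quick way to confirm the layers share a CRS.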
map_spcies_abundance <- function(data = shape_data_coords, fish_species = TILAPIA, name = "Fish Abundance", title = "Fish Abundance at Salton Sea Sampling Stations"){
  # `fish_species` is a bare column name; summarise the data passed in rather than the global object
  survey_sites <- data %>%
group_by(lon, lat) %>%
summarise(fish_species = sum({{fish_species}}, na.rm = TRUE),
geometry = first(geometry)) %>%
st_as_sf(crs = st_crs(shape_data))
ggplot() +
annotation_map_tile(zoom = 10) +
geom_sf(data = survey_sites, aes(color = fish_species), size = 3) +
geom_text_repel(data = unique_sites, # To add the unique site names to each point
aes(x = lon, y = lat, label = SITE),
size = 3) +
coord_sf(crs = st_crs(shape_data)) + # coord_sf() ensures that all layers use a common CRS.
scale_color_viridis_c(option = "C", name = name) +
theme_minimal() +
labs(title = title)
}
tilapia_abundance_map <- map_spcies_abundance(data = shape_data_coords, fish_species = TILAPIA, name = "Tilapia Abundance", title = "Tilapia Abundance at Salton Sea Sampling Stations")
## `summarise()` has grouped output by 'lon'. You can override using the `.groups`
## argument.
tilapia_abundance_map
## Zoom: 10
North Shore accounted for the largest share of the tilapia observed, and the basin locations had the lowest counts.
corvina_abundance_map <- map_spcies_abundance(data = shape_data_coords, fish_species = CORVINA, name = "Corvina Abundance", title = "Corvina Abundance at Salton Sea Sampling Stations")
## `summarise()` has grouped output by 'lon'. You can override using the `.groups`
## argument.
corvina_abundance_map
## Zoom: 10
North Shore was also the site with the highest abundance of another species, corvina.
croaker_abundance_map <- map_spcies_abundance(data = shape_data_coords, fish_species = CROAKER, name = "Croaker Abundance", title = "Croaker Abundance at Salton Sea Sampling Stations")
## `summarise()` has grouped output by 'lon'. You can override using the `.groups`
## argument.
croaker_abundance_map
## Zoom: 10
North Wister had the highest abundance of croaker over the course of the study.
ggplot(long_dat, aes(x = SITE, y = Fish_Abundance, color = Species)) +
geom_point() +
theme_bw() +
ggtitle("Abundance of Fish At Different Sites") +
ylab("Abundance of Fish") +
xlab("Site Name") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 8))
The site with the most fish observed was South Salton City. The basin sites showed the lowest abundance across all species.
The Salton Sea has shown growth in tilapia abundance over time, whereas both croaker and corvina declined sharply shortly after the study began. Sargo were not observed at all during this study, so they were excluded from the analysis. Most fish were found near the shore, while the basin sites showed the lowest abundance of every species.